Beating Atari with Natural Language Guided Reinforcement Learning

Authors

  • Russell Kaplan
  • Chris Sauer
  • Alexander Sosa
Abstract

We introduce the first deep reinforcement learning agent that learns to beat Atari games with the aid of natural language instructions. The agent uses a multimodal embedding between environment observations and natural language to self-monitor progress through a list of English instructions, granting itself additional reward for completing instructions in addition to increasing the game score. Our agent significantly outperforms Deep-Q Networks, Asynchronous Advantage Actor-Critic (A3C) agents, and the best agents posted to OpenAI Gym [4] on what is often considered the hardest Atari 2600 environment [2]: MONTEZUMA’S REVENGE.

Videos of Trained MONTEZUMA’S REVENGE Agents:

  • Our Best Current Model. Score 3500.
  • Best Model Currently on OpenAI Gym. Score 2500.
  • Standard A3C Agent Fails to Learn. Score 0.

Figure 1: Left: an agent exploring the first room of MONTEZUMA’S REVENGE. Right: an example list of natural language instructions one might give the agent. The agent grants itself an additional reward after completing the current instruction. “Completion” is learned by training a generalized multimodal embedding between game images and text.
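The self-granted reward described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the `InstructionTracker` class, the dot-product completion test, the `threshold` and `bonus` values, and the idea that observation and instruction embeddings arrive as plain vectors from some already-trained multimodal model.

```python
from dataclasses import dataclass
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class InstructionTracker:
    """Tracks progress through an ordered list of instruction embeddings.

    Hypothetical helper: the embeddings are assumed to come from a
    pretrained image/text embedding model that is not shown here.
    """
    instruction_embeddings: List[Sequence[float]]
    threshold: float = 0.5  # assumed completion threshold
    bonus: float = 1.0      # assumed extra reward per completed instruction
    current: int = 0        # index of the instruction being worked on

    def shaped_reward(self, observation_embedding: Sequence[float],
                      game_reward: float) -> float:
        """Return the game reward plus a bonus if the current instruction
        is judged complete for this observation."""
        if self.current >= len(self.instruction_embeddings):
            return game_reward  # all instructions already completed
        score = dot(observation_embedding,
                    self.instruction_embeddings[self.current])
        if score > self.threshold:
            self.current += 1  # advance to the next instruction
            return game_reward + self.bonus
        return game_reward


if __name__ == "__main__":
    # Two toy instruction embeddings and a short stream of observations.
    tracker = InstructionTracker([[1.0, 0.0], [0.0, 1.0]])
    for obs, r in [([0.1, 0.2], 0.0), ([0.9, 0.1], 0.0), ([0.0, 0.8], 100.0)]:
        print(tracker.current, tracker.shaped_reward(obs, r))
```

In this sketch the bonus is added on top of the environment's own score, so the agent is still rewarded for increasing the game score while also being rewarded for advancing through the instruction list.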

Similar resources

Learning to Perform Physics Experiments via Deep Reinforcement Learning

When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal driven way. This process of active interaction is in the same spirit as a scientist performing experiments to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman p...

An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients

Deep reinforcement learning methods have shown tremendous success in a large variety of tasks, such as Go [Silver et al., 2016], Atari [Mnih et al., 2013], and continuous control [Lillicrap et al., 2015, Schulman et al., 2015]. Policy gradient methods [Williams, 1992] are an important family of methods in model-free reinforcement learning, and the current state-of-the-art policy gradient methods ar...

Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning

There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of RL tasks, from Atari games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning, that learn to play from experience with minimal knowledge of the specific domain of interest. In this work, we will investigate the performance of these me...

Deep Reinforcement Learning With Macro-Actions

Deep reinforcement learning has been shown to be a powerful framework for learning policies from complex high-dimensional sensory inputs to actions in complex tasks, such as the Atari domain. In this paper, we explore output representation modeling in the form of temporal abstraction to improve convergence and reliability of deep reinforcement learning approaches. We concentrate on macro-action...

Atari Games and Intel Processors

The asynchronous nature of state-of-the-art reinforcement learning algorithms, such as the Asynchronous Advantage Actor-Critic algorithm, makes them exceptionally suitable for CPU computations. However, because deep reinforcement learning often deals with interpreting visual information, a large part of the training and inference time is spent performing convolutions. In this work we...

Journal:
  • CoRR

Volume: abs/1704.05539

Publication date: 2017